2.1 Summary and EDA Report
Our exploratory data analysis (EDA) of comments and posts pertaining to Dogecoin has yielded significant insights into the temporal frequency of these posts, patterns of user/account activity, and content indicative of buy/sell signals. This analysis reveals a pronounced correlation between the volume of Dogecoin-related posts and price fluctuations throughout 2022, particularly when aggregating monthly counts and prices.
Additionally, a notable divergence in discussions about Dogecoin was observed between the r/Cryptocurrency and r/Dogecoin forums. Specifically, as the value of Dogecoin declined in June 2022, the volume of posts and comments within the Dogecoin-specific channel decreased, whereas activity related to Dogecoin on the broader r/Cryptocurrency forum increased. This shift prompts further investigation into whether such differences may reflect variations in the investment portfolios or mentalities of distinct user cohorts.
Moreover, a detailed examination of user behavior on an individual level, analyzing the times of day and months when users are most active, revealed a peak in posting activity at around 16:00 UTC. Additionally, the distribution of active users over the months mirrored price movements, suggesting a potential alignment between user engagement and market trends.
Finally, our analysis extended to the investigation of content related to investment strategies, specifically identifying linguistic cues indicative of buy or sell recommendations. It was found that, generally, the prevalence of buy recommendations exceeds that of sell recommendations. This trend aligns with the community’s composition—predominantly enthusiasts and believers in cryptocurrency—where advocating for purchases can be considered strategically sound within a bullish market context, regardless of whether the intent is to sell or hold.
2.2 Data source and cleaning
2.2.1 Data Cleaning
[🔗Github Link]
Cleaning
Before initiating the Exploratory Data Analysis (EDA) phase, it was imperative to confirm that our dataset was clean and structured appropriately. The steps we followed are listed below:
Filter to the required subreddits, r/Dogecoin and r/Cryptocurrency:
Get all posts from r/Dogecoin.
Get only posts from r/Cryptocurrency which contain ‘doge’ or ‘dogecoin’.
Remove posts with missing values or ‘[deleted]’.
Convert dates from Unix format to a YYYY-mm-dd-hh format (this is needed for time-specific analyses).
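The filtering and timestamp steps above can be sketched in plain Python as follows. This is a minimal illustration and not the code from the project notebook: the record fields mirror the schema columns `subreddit`, `title`, `selftext`, `author`, and `created_utc`, while the function names `keep_post` and `to_hour_bucket` are hypothetical, and checking only those three text fields for missing values is an assumption.

```python
from datetime import datetime, timezone

KEYWORDS = ("doge", "dogecoin")  # "doge" already covers "dogecoin"; both kept for clarity


def keep_post(post):
    """Return True if a submission survives the filtering steps."""
    text_fields = (post.get("title"), post.get("selftext"), post.get("author"))
    # Drop rows with missing values or deleted content.
    if any(v is None or v == "[deleted]" for v in text_fields):
        return False
    sub = post["subreddit"].lower()
    if sub == "dogecoin":
        return True  # keep every r/Dogecoin post
    if sub == "cryptocurrency":
        text = (post["title"] + " " + post["selftext"]).lower()
        return any(k in text for k in KEYWORDS)
    return False  # any other subreddit is out of scope


def to_hour_bucket(created_utc):
    """Convert a Unix timestamp (seconds) to a 'YYYY-mm-dd-hh' string in UTC."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m-%d-%H")
```

Applying `keep_post` as a filter and then mapping `to_hour_bucket` over `created_utc` yields the cleaned records with an hour-level time key.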
Merging
We merged the submissions and comments datasets based on the post ID, which is stored as id in the submissions dataset and as link_id in the comments dataset. The entire data cleaning process is documented in the ‘project_eda_cleaning.ipynb’ notebook. After merging, the characteristics of the dataset are listed below.
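To make the merge step concrete, here is a small sketch of joining comments onto submissions by post ID. The join type is not stated in this report, so the inner join is an assumption, as is the handling of a `t3_` prefix on `link_id`. The function name `merge_submissions_comments` is hypothetical; the `com_` column prefix follows the merged schema.

```python
from collections import defaultdict


def merge_submissions_comments(submissions, comments):
    """Inner-join comments onto submissions by post ID (assumed join type)."""
    by_post = defaultdict(list)
    for c in comments:
        link_id = c["link_id"]
        # Assumption: link_id may carry Reddit's "t3_" type prefix.
        post_id = link_id[3:] if link_id.startswith("t3_") else link_id
        by_post[post_id].append(c)

    merged = []
    for s in submissions:
        for c in by_post.get(s["id"], []):
            row = dict(s)
            # Prefix comment columns with "com_" as in the merged schema.
            row.update({"com_" + k: v for k, v in c.items()})
            merged.append(row)
    return merged
```

Each output row repeats the submission fields once per matching comment, which is why the merged dataset has many more rows than there are posts.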
Summary
The dataset has 587,972 rows and 19 columns. The majority of posts and comments are from r/Dogecoin (487,037) and the rest are from r/Cryptocurrency (100,935).
Variable list
The schema and the variable types are listed below.
subreddit: string (nullable = true)
subreddit_id: string (nullable = true)
id: string (nullable = true)
created_utc: long (nullable = true)
author: string (nullable = true)
is_self: boolean (nullable = true)
num_comments: long (nullable = true)
score: long (nullable = true)
selftext: string (nullable = true)
title: string (nullable = true)
com_subreddit: string (nullable = true)
com_subreddit_id: string (nullable = true)
com_id: string (nullable = true)
com_created_utc: long (nullable = true)
com_author: string (nullable = true)
com_link_id: string (nullable = true)
com_score: long (nullable = true)
com_body: string (nullable = true)
com_submis_id: string (nullable = true)
Generate New Variables:
We created multiple new variables to use in the analysis, as described below.
Buy signals (buy_sig): Set if either the post or any of its comments contains any of these keywords: buy|bought|moon|hold|call|bull|like|yolo
Contains ‘doge|dogecoin’: Whether a post/comment mentions the word ‘doge’ or ‘dogecoin’.
Post activity per minute (hour): The average number of comments made on a post per minute (hour), computed by dividing the total number of comments by the duration between the timestamp when the post was created and the timestamp of the last comment on the post.
Day, month and hour: As described above, obtained by converting the UTC timestamp to yyyy-mm-dd-hh.
Percentage of posts in r/dogecoin (pct_post_rdoge): The proportion of posts in each subreddit.
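The buy-signal and post-activity variables described above could be derived roughly as follows. This is a sketch under stated assumptions: the keyword list is taken from the text, but matching on whole words (the `\b` boundaries) is an added assumption, and the function names are hypothetical. Dropping the boundaries gives plain substring matching.

```python
import re

# Keyword list from the text; whole-word matching is an assumption.
BUY_PATTERN = re.compile(r"\b(buy|bought|moon|hold|call|bull|like|yolo)\b", re.IGNORECASE)


def buy_sig(post_text, comment_texts):
    """Return 1 if the post or any of its comments contains a buy keyword, else 0."""
    texts = [post_text, *comment_texts]
    return int(any(BUY_PATTERN.search(t or "") for t in texts))


def comments_per_hour(num_comments, post_created_utc, last_comment_utc):
    """Average comments per hour between post creation and the last comment."""
    hours = (last_comment_utc - post_created_utc) / 3600
    # Guard against posts whose last comment has the same timestamp as the post.
    return num_comments / hours if hours > 0 else 0.0
```

The per-minute variant follows by dividing the duration by 60 instead of 3600.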
2.2.2 Price Query
[🔗Github Link]
Using the CryptoCompare API, we collected hourly price data for both Bitcoin and Dogecoin throughout 2022, yielding a dataset of 8,762 entries. The data retrieval was divided into six sessions to stay within the API’s limit of 2,000 requests per session. Because Bitcoin and Dogecoin prices differ greatly in absolute value, the visualization uses dual y-axes so that both price trajectories can be compared within the same plot.
Dogecoin vs. Bitcoin Price
The analysis reveals that both Bitcoin and Dogecoin declined in value in 2022, coinciding with the broader transition from a bullish to a bearish market within the cryptocurrency domain. Notably, Dogecoin was more volatile than Bitcoin. This heightened fluctuation can be attributed to Dogecoin’s valuation being significantly influenced by community sentiment rather than intrinsic economic factors. A particularly intriguing observation was Dogecoin’s price surge during the FTX crisis, suggesting potential responsiveness to specific market events.
Dogecoin vs. Bitcoin Growth Rate
To quantify the observed trends, we computed the growth rate based on periodic differences. This calculation reinforces the preliminary findings, highlighting Dogecoin’s pronounced susceptibility to fluctuations in response to market events, such as regulatory changes or major scandals. The comparative analysis underscores the distinct behavioral patterns of Bitcoin and Dogecoin within the same market conditions, offering valuable insights into the dynamics of cryptocurrency markets. This research contributes to the academic discourse by elucidating the factors driving volatility in digital currencies, with a particular focus on the influence of community engagement and external events on market behavior.
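As an illustration of a growth rate based on periodic differences, the sketch below computes the simple period-over-period change, g_t = (p_t - p_{t-1}) / p_{t-1}, and uses the standard deviation of those rates as a rough volatility measure. Both the exact formula and the volatility measure are assumptions, since the report does not spell them out, and the function names are hypothetical.

```python
from statistics import pstdev


def growth_rates(prices):
    """Period-over-period growth: (p_t - p_{t-1}) / p_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def volatility(prices):
    """Population standard deviation of the growth rates (assumed measure)."""
    return pstdev(growth_rates(prices))
```

Because growth rates are unit-free, they allow Bitcoin and Dogecoin to be compared on a single axis despite the large gap in their absolute prices.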
2.3 Q1: Exploring frequency of posts [🔗Github Link]
2.3.1. How has the volume of discussion of Dogecoin on Reddit changed over time?
The summary table below shows which months saw the most discussion of Dogecoin.
Year-month | Count |
---|---|
2022-01 | 97339 |
2022-02 | 101368 |
2022-03 | 83904 |
2022-04 | 79935 |
2022-05 | 16987 |
2022-06 | 29272 |
2022-07 | 19402 |
2022-08 | 9455 |
2022-09 | 16497 |
2022-10 | 20306 |
2022-11 | 53971 |
2022-12 | 35109 |
2023-01 | 24427 |
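A monthly count table like the one above can be produced by bucketing each record's Unix timestamp into a year-month key. The following is a minimal sketch, assuming timestamps are in seconds and interpreted in UTC; the function name `monthly_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_counts(timestamps):
    """Count records per 'YYYY-MM' bucket from Unix timestamps (UTC)."""
    months = (
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        for ts in timestamps
    )
    # Sort by key so the result reads chronologically.
    return dict(sorted(Counter(months).items()))
```

Taking `max(counts, key=counts.get)` on the result identifies the busiest month.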
2.3.3 Is there a difference in different subreddits?
The plot shows the difference between r/Cryptocurrency and r/Dogecoin. First, we find that most discussion about Dogecoin takes place in r/Dogecoin rather than in r/Cryptocurrency. This is probably a bull-market effect: compared to Dogecoin, mainstream cryptocurrencies like Bitcoin and Ethereum are more attractive to common crypto investors, while only Dogecoin fanatics will persist with Dogecoin when other coins are also good investment options.
Another interesting finding is that in June 2022, when the Dogecoin price bounced back, discussion in r/Dogecoin decreased while discussion in r/Cryptocurrency resumed. This is probably because the rebound was not as strong as the Dogecoin community expected, whereas investors who had stopped following Dogecoin after its plunge saw it as an opportunity.
2.4 Q2: Exploring user/account patterns [🔗Github Link]
2.4.1 What are the patterns of user activity on the subreddits of interest?
There are 87,743 unique authors in r/Dogecoin and the specific selection of r/Cryptocurrency that we considered. ‘Users’ include any user who has authored a post or a comment on these subreddits. Of these users, 2,782 were active in both subreddits.
r/subreddit | Number of Users | Posts Per User | Comments Per User | Average Score Per Post by User | Average Score Per Comment by User |
---|---|---|---|---|---|
r/CryptoCurrency only | 23,007 | 0.06 | 2.94 | 2.69 | 2.99 |
r/dogecoin only | 61,954 | 0.55 | 4.00 | 10.02 | 2.93 |
Both (r/CryptoCurrency + r/dogecoin) | 2,782 | 4.26 | 92.33 | 16.40 | 2.99 |
Breaking down users by their affiliation to the different subreddits, there are distinct engagement patterns. The users who are part of both the subreddits lead across almost all metrics indicating that they are very enthusiastic about the prospects of Dogecoin and drive conversation about it across channels. Posts generated by these users are distinctively more appreciated as compared to those generated by users who are dedicated to a single subreddit.
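The three user groups in the table can be derived with simple set operations over (author, subreddit) pairs. This is a sketch only: the function name `segment_users` is hypothetical, and the input is assumed to cover both posts and comments.

```python
def segment_users(activity):
    """Split authors into subreddit-exclusive groups and an overlap group.

    `activity` is an iterable of (author, subreddit) pairs.
    """
    activity = list(activity)  # allow generators to be scanned twice
    doge = {a for a, s in activity if s.lower() == "dogecoin"}
    crypto = {a for a, s in activity if s.lower() == "cryptocurrency"}
    return {
        "r/CryptoCurrency only": crypto - doge,
        "r/dogecoin only": doge - crypto,
        "Both": doge & crypto,
    }
```

The three groups are disjoint by construction, so their sizes sum to the number of unique authors, consistent with 23,007 + 61,954 + 2,782 = 87,743 in the table.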
2.4.2 Are there any characteristics of when users engage on the platform?
We proceed to evaluate posting behavior over time. This sort of analysis helps us understand whether there are particular windows of the day in which activity levels are higher or lower. Our analysis shows that posts and comments in the subreddits follow similar trends over time. Across the 24-hour window of a day, posts and comments show a distinct peak and a distinct slump. Peak activity occurred from 14:00 to 20:00 UTC, while the slump can be seen from 06:00 to 12:00 UTC. Although we do not have additional information about the users’ local time zones, the usage patterns may help in guessing the local time zones of clusters of users.
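The hour-of-day profile described above can be computed by counting records per UTC hour. The sketch below assumes Unix timestamps in seconds, as in `created_utc` and `com_created_utc`; the function name `hourly_activity` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_activity(timestamps):
    """Return a 24-element list of record counts by UTC hour of day."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    # Fill hours with no activity so the result always has 24 entries.
    return [counts.get(h, 0) for h in range(24)]
```

The index of the largest entry gives the peak hour, and the same function can be run separately on post and comment timestamps to compare the two profiles.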
Activity can be estimated from tagged signals of engagement by a user, derived from data on the user publishing either a ‘post’ or a ‘comment’. Based on this analysis, we see higher numbers of active users from January to April 2022, followed by a slump to low levels before a partial recovery towards the end of 2022. These levels directly follow the patterns seen in the volume of posts and comments. Segmenting users by the number of days on which they were active shows how they remain engaged with the conversations on the platform. We see indicators of short-term engagement for the majority of users, who have typically stayed engaged for fewer than 5 days across the observation period.
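One way to count a user's active days is to collect the distinct calendar dates on which they posted or commented. The sketch below assumes dates are taken in UTC and that an 'active day' means at least one post or comment; the function name `active_days_per_user` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def active_days_per_user(events):
    """Map each author to the number of distinct UTC dates with a post or comment.

    `events` is an iterable of (author, unix_timestamp) pairs.
    """
    days = defaultdict(set)
    for author, ts in events:
        days[author].add(datetime.fromtimestamp(ts, tz=timezone.utc).date())
    return {author: len(d) for author, d in days.items()}
```

Binning the resulting counts, for example into fewer than 5 days versus 5 or more, gives the engagement segments discussed above.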
2.5 Q3: Analyzing content - buy recommendations [🔗Github Link]
We follow an approach used in a paper by Chacon et al. (2022), which studies the effect of ‘buy’ recommendations in submissions in the r/WallStreetBets subreddit on actual stock performance. Following the same approach, we generate a binary variable named buy_signal, which is equal to 1 if the post contains one of these words: buy, bought, moon, hold, call, bull, like, and yolo.
2.5.1. How do buy recommendations change as % of posts when activity increases in subreddits?
As the above time series plot shows, on a typical day in 2022, between roughly 10% and 25% of posts in r/dogecoin and in r/cryptocurrency (those which mention Dogecoin) contain words in the ‘buy words’ list defined above. However, this share jumps suddenly during some periods, such as June to August 2022, when there were days on which more than 50% of posts/comments made by users had a ‘buy’ signal.
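The daily percentage of posts carrying a buy signal can be computed by grouping records by day and dividing the number of flagged records by the daily total. The following is a sketch, assuming each record has already been reduced to a (day, buy_signal) pair; the function name `daily_buy_share` is hypothetical.

```python
from collections import defaultdict


def daily_buy_share(records):
    """Percentage of records per day whose buy signal equals 1.

    `records` is an iterable of (day, buy_signal) pairs, e.g. ("2022-06-15", 1).
    """
    totals = defaultdict(int)
    buys = defaultdict(int)
    for day, signal in records:
        totals[day] += 1
        buys[day] += signal
    return {day: 100.0 * buys[day] / totals[day] for day in sorted(totals)}
```

Running this separately for each subreddit gives one series per subreddit, which is the form a time series comparison needs.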
2.5.2. What is the average score of posts which contain buy signals for doge?
The average score of posts containing buy signals is much higher than that of posts without them in both subreddits. The baseline levels are higher in r/dogecoin, reflecting the stronger ingrained affiliation towards the coin. We confirm that the difference in scores is statistically significant by conducting a t-test between the samples, the results of which are given in the table below.
Subreddit | Buy Signal Present | Mean Score | p-value |
---|---|---|---|
r/Dogecoin | No (0) | 56.03 | 4.37e-52 |
r/Dogecoin | Yes (1) | 101.65 | 4.37e-52 |
r/CryptoCurrency | No (0) | 30.08 | 0.000144 |
r/CryptoCurrency | Yes (1) | 70.91 | 0.000144 |
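For reference, a two-sample t statistic of the kind used for this comparison can be computed as shown below. The report does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the function name `welch_t` is hypothetical. The sketch stops at the statistic and degrees of freedom; obtaining a p-value additionally requires the t-distribution CDF from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean.
    va = variance(sample_a) / n_a
    vb = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df
```

Here `sample_a` and `sample_b` would be the scores of posts with and without a buy signal within one subreddit.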